Abstract

Combining different prior distributions is an important issue in decision theory and Bayesian
inference. Logarithmic pooling is a popular method to aggregate expert opinions by using a set
of weights that reflect the reliability of each information source. The resulting pooled distribution,
however, depends heavily on the set of weights given to each opinion/prior. In this paper we explore three
objective approaches to assigning weights to opinions. Two methods are stated in terms of optimization
problems and a third one uses a hierarchical prior that accounts for uncertainty on the
weights. We explore an example in which a proportion is estimated using weights assigned to a finite
set of opinions and show that, depending on the method used, results vary from discarding some of
the expert opinions to assigning all opinions nearly equal weights. Nevertheless,
the three methods explored in this paper lead to very similar combined priors, with very similar
integrated (marginal) likelihoods.

Key-words: logarithmic pooling; expert opinion; maximum entropy; Kullback-Leibler divergence;
Dirichlet prior.


Background 

Combining probability distributions is a topic of general interest, both in the statistical (Genest
et al., 1986; Genest and Zidek, 1986) and decision theory literatures (Genest et al., 1984). On the
theoretical front, studying opinion pooling operators may give important insights into consensus belief
formation and group decision making (Genest and Zidek, 1986). Among the various opinion pooling
operators proposed in the literature, logarithmic pooling has enjoyed much popularity, mainly due to
its many desirable properties such as relative propensity consistency (RPC) and external Bayesianity
(EB) (Genest et al., 1986). In a practical setting, logarithmic pooling finds use in a range of fields, from
infectious disease modelling (Coelho and Codeço, 2009) and wildlife conservation (Poole and Raftery,
2000) to engineering (Lind and Nowak, 1988; Savchuk and Martz, 1994).

A common situation of interest is that of combining expert opinions, represented as proper probability
distributions, about a quantity of interest $\theta \in \Theta \subseteq \mathbb{R}^p$. Combining these opinions using logarithmic
pooling requires assigning weights to each of the experts. These weights represent the reliability of each
opinion (Genest et al., 1984). This requirement naturally leads to the question of how to choose the
weights in a meaningful fashion, according to some well-accepted optimality criterion. There are a few
proposals in the literature that build methods using different approaches. One proposal is to maximise
the entropy of the pooled distribution (Myung et al., 1996), whereas another is to minimise the Kullback-Leibler
(KL) divergence between the pooled distribution and the individual opinions (Abbas, 2009) or
between the pooled (prior) distribution and the posterior distribution (Rufo et al., 2012a,b).

These approaches, while moving away from the problem of arbitrarily assigning the weights, arrive at
single-point solutions, similar to point estimates in statistical theory. While acknowledging that these
approaches have merit, we argue that in many settings where one has substantial prior information on
the relative reliabilities of the information sources (experts), it would be desirable to incorporate this
information into the pooling procedure while accommodating uncertainty about the weights (Poole and
Raftery, 2000). Moreover, assigning a probability distribution to the weights permits us to obtain a
posterior distribution using a Bayesian procedure, which in turn enables us to learn about these weights.
This makes it possible to sequentially update knowledge about the reliability of each expert/source
in the face of new data.

In this paper we explore previous approaches for deriving the weights for logarithmic pooling, namely
maximising the entropy of the resulting distribution and minimising the KL divergence between
the pooled distribution and each individual distribution. Additionally, we propose a hierarchical prior
approach in which we place a Dirichlet prior on the weights. We present an example of proportion
estimation by combining Beta priors.

In what follows, we introduce the necessary theory and extend a previous result (Poole and Raftery,
2000) to the combination of more than two distributions.

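Let $\mathbf{F}(\theta) = \{f_0(\theta), f_1(\theta), f_2(\theta), \ldots, f_K(\theta)\}$ be the set of prior distributions representing the opinions
of $K + 1$ experts and let $\boldsymbol\alpha = \{\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_K\}$ be the vector of weights, such that $\alpha_i > 0\ \forall i$ and
$\sum_{i=0}^K \alpha_i = 1$. Then the log-pooled prior is

$$\pi(\theta) = t(\boldsymbol\alpha)\prod_{i=0}^K f_i(\theta)^{\alpha_i}, \tag{1}$$

where $t(\boldsymbol\alpha)^{-1} = \int_\Theta \prod_{i=0}^K f_i(\theta)^{\alpha_i}\,d\theta$.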
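To make (1) concrete, the following Python sketch (an illustration, not the authors' code) evaluates the log-pooled density of arbitrary one-dimensional opinions, approximating $t(\boldsymbol\alpha)$ with a midpoint rule on a finite interval. The Gaussian opinions, the integration range and the grid size are assumptions made for the example.

```python
import math


def normal_logpdf(theta, mu, sigma=1.0):
    """Log-density of N(mu, sigma^2), used here only as an example opinion."""
    z = (theta - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_pool(log_densities, alphas, lower, upper, n=20000):
    """Return (t, pooled) for the log-pool in (1), normalised on [lower, upper].

    t(alpha) is approximated with a midpoint rule, so [lower, upper] must
    contain essentially all of the pooled probability mass.
    """
    if any(a < 0.0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")

    def kernel(theta):
        # prod_i f_i(theta)^alpha_i, computed on the log scale for stability.
        return math.exp(sum(a * lf(theta) for lf, a in zip(log_densities, alphas)))

    h = (upper - lower) / n
    integral = h * sum(kernel(lower + (k + 0.5) * h) for k in range(n))
    t = 1.0 / integral
    return t, lambda theta: t * kernel(theta)


# Two Gaussian opinions, N(0, 1) and N(2, 1), pooled with equal weights.
opinions = [lambda th: normal_logpdf(th, 0.0), lambda th: normal_logpdf(th, 2.0)]
t, pooled = log_pool(opinions, [0.5, 0.5], lower=-10.0, upper=12.0)
print(t, pooled(1.0))
```

For these two opinions the pooled density is exactly $N(1, 1)$, so `t` should be close to $e^{1/2} \approx 1.6487$ and `pooled(1.0)` close to $1/\sqrt{2\pi} \approx 0.3989$.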

Logarithmic pooling will only yield proper probability distributions if it is possible to normalise the 
expression in (1). This condition is usually assumed implicitly, without proof. Poole and Raftery (2000) 
provide a proof for the case of two densities (see Theorem 1 therein), which we extend for the case of a 
finite number of densities. 

Theorem 1. Let $\mathcal{A}$ be the $(K + 1)$-dimensional open simplex on $[0,1]$. For all $\boldsymbol\alpha \in \mathcal{A}$ there exists a
constant $t(\boldsymbol\alpha)$ such that $\int_\Theta \pi(\theta)\,d\theta = 1$.

Here we provide a simple proof using Hölder's inequality.


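Proof. We begin by noting that, since $\sum_{i=0}^K \alpha_i = 1$, $\pi(\theta)$ can be rewritten as:

$$\pi(\theta) = t(\boldsymbol\alpha)\, f_0(\theta)\prod_{j=1}^K \left(\frac{f_j(\theta)}{f_0(\theta)}\right)^{\alpha_j}. \tag{2}$$

Let $X_j = f_j(\theta)/f_0(\theta)$, $j = 1, 2, \ldots, K$. Then integrating the expression in (2) is equivalent to finding

$$E_{f_0}\left[\prod_{j=1}^K X_j^{\alpha_j}\right] \le \prod_{j=1}^K E_{f_0}[X_j]^{\alpha_j}, \tag{3}$$

where $E_{f_0}[\cdot]$ is the expectation w.r.t. $f_0$ and (3) follows from Hölder's inequality for expectations (Yeh,
2011), applied with exponents $1/\alpha_j$ together with the constant factor $X_0 \equiv 1$ raised to $\alpha_0$. Since, for all $j$,
$E_{f_0}[X_j] \le \int_\Theta f_j(\theta)\,d\theta = 1$, the integral is finite and Theorem 1 is proven. □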
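The bound in (3) can be checked numerically in a special case. The sketch below assumes unit-variance normal opinions $f_i = N(\mu_i, 1)$, for which completing the square gives $\int \prod_i f_i(\theta)^{\alpha_i}\,d\theta = \exp\{-\tfrac12 \mathrm{Var}_{\boldsymbol\alpha}(\mu)\}$, where $\mathrm{Var}_{\boldsymbol\alpha}(\mu) = \sum_i \alpha_i\mu_i^2 - (\sum_i \alpha_i\mu_i)^2$ is the weighted variance of the means. This only illustrates the theorem and is no substitute for the proof; the means and weights are arbitrary.

```python
import math


def normal_logpdf(theta, mu):
    """Log-density of a unit-variance normal N(mu, 1)."""
    return -0.5 * (theta - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)


def pooled_integral(mus, alphas, lower=-20.0, upper=20.0, n=40000):
    """Midpoint-rule approximation of the integral of prod_i f_i(theta)^alpha_i."""
    h = (upper - lower) / n
    total = 0.0
    for k in range(n):
        theta = lower + (k + 0.5) * h
        total += math.exp(sum(a * normal_logpdf(theta, m) for m, a in zip(mus, alphas)))
    return h * total


def closed_form(mus, alphas):
    """exp(-Var_alpha(mu) / 2), valid for unit-variance normal opinions."""
    mean = sum(a * m for m, a in zip(mus, alphas))
    second = sum(a * m * m for m, a in zip(mus, alphas))
    return math.exp(-0.5 * (second - mean * mean))


mus, alphas = [0.0, 1.0, 3.0], [0.2, 0.5, 0.3]
numeric = pooled_integral(mus, alphas)
# Both values are at most 1, which is the bound from (3); hence t(alpha) >= 1.
print(numeric, closed_form(mus, alphas))
```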

We now move on to study three approaches to assigning weights: the first two are based on
optimality criteria and the third places a Dirichlet prior distribution on the weights.


Choosing the weights based on optimality criteria 

Maximum entropy 

In a context of near-complete uncertainty about the relative reliabilities of the experts (information
sources), it may be desirable to combine the prior distributions such that $\pi(\theta)$ is maximally uninformative.
Such an approach would ensure that, given the constraints imposed by $\mathbf{F}(\theta)$, the pooled distribution is the
one that best represents the current state of knowledge (Jaynes, 1957; Savchuk and Martz, 1994). In
order to choose $\boldsymbol\alpha$ so as to maximise prior diffuseness, one can maximise the entropy of the log-pooled
prior:

$$H_\pi(\theta) = E_\pi[-\ln \pi(\theta)] = -\int_\Theta \pi(\theta)\ln\pi(\theta)\,d\theta \tag{4}$$

In some cases it may be useful to express (4) as

$$H_\pi(\theta; \boldsymbol\alpha) = -\sum_{i=0}^K \alpha_i E_\pi[\ln f_i(\theta)] - \ln t(\boldsymbol\alpha) \tag{5}$$
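Equation (5) follows from taking logarithms in (1), since $\ln\pi(\theta) = \ln t(\boldsymbol\alpha) + \sum_i \alpha_i \ln f_i(\theta)$. The Python sketch below checks numerically that (4) and (5) agree, again assuming two unit-variance normal opinions (an arbitrary choice), for which the equal-weight pooled prior is $N(1, 1)$ with entropy $\tfrac12\ln(2\pi e)$.

```python
import math


def normal_logpdf(theta, mu):
    """Log-density of a unit-variance normal N(mu, 1)."""
    return -0.5 * (theta - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)


mus, alphas = [0.0, 2.0], [0.5, 0.5]
lower, upper, n = -15.0, 17.0, 40000
h = (upper - lower) / n
grid = [lower + (k + 0.5) * h for k in range(n)]

# Log of the unnormalised pool, sum_i alpha_i ln f_i(theta), and t(alpha).
log_kernel = [sum(a * normal_logpdf(th, m) for m, a in zip(mus, alphas)) for th in grid]
t = 1.0 / (h * sum(math.exp(v) for v in log_kernel))
density = [t * math.exp(v) for v in log_kernel]

# Equation (4): H = -int pi(theta) ln pi(theta) d theta.
h_direct = -h * sum(p * math.log(p) for p in density if p > 0.0)

# Equation (5): H = -sum_i alpha_i E_pi[ln f_i(theta)] - ln t(alpha).
expected_log_f = [h * sum(p * normal_logpdf(th, m) for p, th in zip(density, grid)) for m in mus]
h_decomposed = -sum(a * e for a, e in zip(alphas, expected_log_f)) - math.log(t)

print(h_direct, h_decomposed, 0.5 * math.log(2.0 * math.pi * math.e))
```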

Formally, we want to find $\hat{\boldsymbol\alpha}$ such that

$$\hat{\boldsymbol\alpha} := \arg\max_{\boldsymbol\alpha} H_\pi(\theta; \boldsymbol\alpha) \tag{6}$$

This approach, however, does not lead to a convex optimisation problem, so a unique solution is not
guaranteed. See Proposition 1, below, for intuition as to why.
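Because (6) need not have a unique maximiser, a local optimiser may return different answers from different starting points. A crude but transparent alternative is an exhaustive search over a regular grid on the simplex. The Python sketch below does this for the Beta opinions of the application section, using the fact that the log-pool of Beta densities is Beta$(a^*, b^*)$ with $a^* = \sum_i \alpha_i a_i$ and $b^* = \sum_i \alpha_i b_i$, together with the Beta entropy formula (16). The grid resolution (`steps`) and the digamma approximation are assumptions of this sketch, which is not intended to reproduce Table 1.

```python
import itertools
import math


def digamma(x):
    """Digamma for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def beta_entropy(a, b):
    """Differential entropy of Beta(a, b), equation (16)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (log_beta - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))


def simplex_grid(k, steps):
    """Yield every weight vector of length k with entries in {0, 1/steps, ..., 1}
    summing to one (stars and bars)."""
    slots = steps + k - 1
    for bars in itertools.combinations(range(slots), k - 1):
        edges = (-1,) + bars + (slots,)
        yield tuple((edges[i + 1] - edges[i] - 1) / steps for i in range(k))


def max_entropy_weights(a, b, steps=20):
    """Grid search for (6) when every opinion f_i is Beta(a_i, b_i)."""
    best_h, best_w = -math.inf, None
    for w in simplex_grid(len(a), steps):
        a_star = sum(wi * ai for wi, ai in zip(w, a))
        b_star = sum(wi * bi for wi, bi in zip(w, b))
        h = beta_entropy(a_star, b_star)
        if h > best_h:
            best_h, best_w = h, w
    return best_w, best_h


a = [18.10, 3.44, 8.82, 1.98]
b = [0.955, 0.860, 0.924, 0.848]
weights, entropy = max_entropy_weights(a, b)
print(weights, entropy)
```

Since the entropy is only evaluated at grid points, the reported weights are accurate to within `1/steps`; the grid includes the vertices of the simplex, where boundary solutions such as those in Table 1 live.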
Minimising Kullback-Leibler divergence

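Alternatively, one may choose the pooling weights so as to minimise the total Kullback-Leibler
divergence between each proposed distribution and the pooled distribution. Let $d_i = \mathrm{KL}(f_i \,\|\, \pi)$ and let
$L(\boldsymbol\alpha)$ be a loss function such that

$$L(\boldsymbol\alpha) = \sum_{i=0}^K d_i \tag{7}$$

$$L(\boldsymbol\alpha) = -(K+1)\ln t(\boldsymbol\alpha) + \sum_{i=0}^K \sum_{j \neq i} \alpha_j\,\mathrm{KL}(f_i \,\|\, f_j) \tag{8}$$

We then seek

$$\hat{\boldsymbol\alpha} := \arg\min_{\boldsymbol\alpha} L(\boldsymbol\alpha) \tag{9}$$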
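The step from (7) to (8) can be made explicit. Substituting $\ln\pi(\theta) = \ln t(\boldsymbol\alpha) + \sum_j \alpha_j \ln f_j(\theta)$ from (1) into each divergence, and using $\sum_j \alpha_j = 1$, gives

```latex
\begin{aligned}
d_i = \mathrm{KL}(f_i \,\|\, \pi)
  &= \int_\Theta f_i(\theta)\ln f_i(\theta)\,d\theta - \ln t(\boldsymbol\alpha)
     - \sum_{j=0}^K \alpha_j \int_\Theta f_i(\theta)\ln f_j(\theta)\,d\theta \\
  &= -\ln t(\boldsymbol\alpha)
     + \sum_{j=0}^K \alpha_j \int_\Theta f_i(\theta)\ln\frac{f_i(\theta)}{f_j(\theta)}\,d\theta \\
  &= -\ln t(\boldsymbol\alpha) + \sum_{j \neq i} \alpha_j\,\mathrm{KL}(f_i \,\|\, f_j),
\end{aligned}
```

where the last line uses $\mathrm{KL}(f_i \,\|\, f_i) = 0$. Summing these $K+1$ terms over $i = 0, \ldots, K$ yields the double sum in (8) together with $-(K+1)\ln t(\boldsymbol\alpha)$.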

Proposition 1. The distribution obtained following (9) is unique, i.e., there is only one aggregated prior
$\pi(\theta)$ that minimises $L(\boldsymbol\alpha)$.

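This property is proven in Rufo et al. (2012a). One can get some intuition into the proof by noting
that the first term of (8) is a negative multiple of $\ln t(\boldsymbol\alpha)$ and the second is linear in $\boldsymbol\alpha$, so minimising (8)
amounts to maximising $\ln t(\boldsymbol\alpha) = -\ln \int_\Theta \prod_{i=0}^K f_i(\theta)^{\alpha_i}\,d\theta$ up to a linear term. Rufo et al. (2012a) show
that $\ln t(\boldsymbol\alpha)$ is concave, therefore the problem in (9) has a unique solution. By contrast, the problem in (6)
involves minimising $\ln t(\boldsymbol\alpha)$ (see (5)), hence lacking a sufficient condition for the existence of a unique
solution.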
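For the Beta opinions of the application section, (9) can be explored with the same kind of grid search, using the closed-form Beta divergence (17) with $\pi = \mathrm{Beta}(a^*, b^*)$. The grid resolution and the digamma approximation are assumptions of this Python sketch; given Proposition 1, a convex solver would be the natural choice in practice.

```python
import itertools
import math


def digamma(x):
    """Digamma for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_kl(a1, b1, a2, b2):
    """KL(Beta(a1, b1) || Beta(a2, b2)), i.e. (17) with (a2, b2) = (a*, b*)."""
    return (log_beta(a2, b2) - log_beta(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))


def simplex_grid(k, steps):
    """Yield weight vectors of length k on a regular grid over the simplex."""
    slots = steps + k - 1
    for bars in itertools.combinations(range(slots), k - 1):
        edges = (-1,) + bars + (slots,)
        yield tuple((edges[i + 1] - edges[i] - 1) / steps for i in range(k))


def kl_loss(w, a, b):
    """L(alpha) in (7): summed divergence from each Beta opinion to Beta(a*, b*)."""
    a_star = sum(wi * ai for wi, ai in zip(w, a))
    b_star = sum(wi * bi for wi, bi in zip(w, b))
    return sum(beta_kl(ai, bi, a_star, b_star) for ai, bi in zip(a, b))


a = [18.10, 3.44, 8.82, 1.98]
b = [0.955, 0.860, 0.924, 0.848]
weights = min(simplex_grid(len(a), 20), key=lambda w: kl_loss(w, a, b))
print(weights, kl_loss(weights, a, b))
```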


Specifying a prior distribution for α


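In this section we propose a hierarchical prior for $\theta$ conditional on $\boldsymbol\alpha$ in order to incorporate uncertainty
about the weights. A natural choice for a prior distribution for $\boldsymbol\alpha$ is the $(K + 1)$-dimensional Dirichlet
distribution. The conditional distribution $\pi(\theta \mid \boldsymbol\alpha)$ is of the form in (1) and the prior density for $\boldsymbol\alpha$ is

$$\pi(\boldsymbol\alpha) = \frac{1}{B(\mathbf{X})}\prod_{i=0}^K \alpha_i^{x_i - 1}, \tag{10}$$

where $\mathbf{X} = \{x_0, x_1, \ldots, x_K\}$ is the vector of hyperparameters for the Dirichlet prior and $B(\mathbf{X})$ is the
multinomial beta function. The marginal prior for $\theta$ is then

$$\pi(\theta) = \int_{\mathcal{A}} \pi(\theta \mid \boldsymbol\alpha)\,\pi(\boldsymbol\alpha)\,d\boldsymbol\alpha \tag{11}$$

$$\pi(\theta) = \frac{1}{B(\mathbf{X})}\int_{\mathcal{A}} t(\boldsymbol\alpha)\prod_{i=0}^K f_i(\theta)^{\alpha_i}\,\alpha_i^{x_i - 1}\,d\boldsymbol\alpha \tag{12}$$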
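The marginal prior in (11) and (12) is the expectation of $\pi(\theta \mid \boldsymbol\alpha)$ under the Dirichlet prior (10), so it can be approximated by averaging the pooled density over Dirichlet draws. The Python sketch below samples $\boldsymbol\alpha$ by normalising independent Gamma$(x_i, 1)$ variates, a standard Dirichlet construction, and assumes unit-variance normal opinions $N(\mu_i, 1)$, for which $\pi(\theta \mid \boldsymbol\alpha)$ is $N(\sum_i \alpha_i\mu_i, 1)$; the means, hyperparameters and seed are arbitrary.

```python
import math
import random


def sample_dirichlet(x, rng):
    """Draw alpha ~ Dirichlet(x) by normalising independent Gamma(x_i, 1) variates."""
    g = [rng.gammavariate(xi, 1.0) for xi in x]
    total = sum(g)
    return [gi / total for gi in g]


def normal_pdf(theta, mu):
    """Density of a unit-variance normal N(mu, 1)."""
    return math.exp(-0.5 * (theta - mu) ** 2) / math.sqrt(2.0 * math.pi)


def marginal_prior(theta, mus, x, n_draws, rng):
    """Monte Carlo estimate of (12) for unit-variance normal opinions N(mu_i, 1).

    For such opinions pi(theta | alpha) is N(sum_i alpha_i mu_i, 1), so each
    Dirichlet draw only changes the mean of the pooled normal.
    """
    total = 0.0
    for _ in range(n_draws):
        alpha = sample_dirichlet(x, rng)
        total += normal_pdf(theta, sum(a * m for a, m in zip(alpha, mus)))
    return total / n_draws


rng = random.Random(1)
x = [0.25, 0.25, 0.25, 0.25]   # Dirichlet hyperparameters (arbitrary)
mus = [0.0, 1.0, 2.0, 3.0]     # opinion means (arbitrary)
draws = [sample_dirichlet(x, rng) for _ in range(20000)]
mean_alpha = [sum(d[i] for d in draws) / len(draws) for i in range(len(x))]
density = marginal_prior(1.5, mus, x, 5000, rng)
print(mean_alpha, density)
```

The empirical means of the weights should be close to $x_i / \sum_j x_j = 1/4$.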


Application: binomial probabilities 

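We now turn our attention to combining expert opinions about probabilities and proportions. In this
setting we are interested in the random variable $Y \sim \text{Bernoulli}(\theta)$ and, as before, we want to
obtain a combined prior for the proportion $\theta$. A common choice for $\mathbf{F}(\theta)$ is the Beta family of distributions:

$$f_i(\theta; a_i, b_i) = \frac{\theta^{a_i - 1}(1 - \theta)^{b_i - 1}}{B(a_i, b_i)}.$$

The log-pooled prior is then

$$\pi(\theta) = t(\boldsymbol\alpha)\prod_{i=0}^K f_i(\theta; a_i, b_i)^{\alpha_i} \tag{13}$$

$$\pi(\theta) \propto \prod_{i=0}^K \left(\theta^{a_i - 1}(1 - \theta)^{b_i - 1}\right)^{\alpha_i} \tag{14}$$

$$\pi(\theta) \propto \theta^{a^* - 1}(1 - \theta)^{b^* - 1} \tag{15}$$

with $a^* = \sum_{i=0}^K \alpha_i a_i$ and $b^* = \sum_{i=0}^K \alpha_i b_i$. Thus (15) is the kernel of a Beta distribution with parameters
$a^*$ and $b^*$, and the entropy (4) of the pooled prior is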

$$H_\pi(\theta) = \ln B(a^*, b^*) - (a^* - 1)\psi(a^*) - (b^* - 1)\psi(b^*) + (a^* + b^* - 2)\psi(a^* + b^*), \tag{16}$$

where $\psi(\cdot)$ is the digamma function.
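The Python sketch below (an illustration with our own function names, not the authors' R or Stan code) computes $(a^*, b^*)$ for a given weight vector, checks numerically that $\sum_i \alpha_i \ln f_i(\theta) - \ln \mathrm{Beta}(\theta; a^*, b^*)$ does not depend on $\theta$, as (15) implies, and evaluates the entropy (16). The Python standard library has no digamma function, so it is approximated with the usual recurrence plus an asymptotic series; equal weights are used purely for illustration.

```python
import math


def digamma(x):
    """Digamma for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_logpdf(theta, a, b):
    return (a - 1) * math.log(theta) + (b - 1) * math.log1p(-theta) - log_beta(a, b)


def pooled_params(a, b, alphas):
    """(a*, b*) of the log-pooled Beta prior in (15)."""
    return (sum(w * ai for w, ai in zip(alphas, a)),
            sum(w * bi for w, bi in zip(alphas, b)))


def beta_entropy(a, b):
    """Entropy of Beta(a, b), equation (16)."""
    return (log_beta(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))


a = [18.10, 3.44, 8.82, 1.98]
b = [0.955, 0.860, 0.924, 0.848]
alphas = [0.25, 0.25, 0.25, 0.25]
a_star, b_star = pooled_params(a, b, alphas)


def log_ratio(theta):
    """sum_i alpha_i ln f_i(theta) - ln Beta(theta; a*, b*), which equals -ln t(alpha)."""
    pooled = sum(w * beta_logpdf(theta, ai, bi) for w, ai, bi in zip(alphas, a, b))
    return pooled - beta_logpdf(theta, a_star, b_star)


print(a_star, b_star, log_ratio(0.3), log_ratio(0.8), beta_entropy(a_star, b_star))
```

As sanity checks, Beta(1, 1) is uniform and has entropy 0, while Beta(2, 2) has entropy $-\ln 6 - 2\psi(2) + 2\psi(4) \approx -0.1251$.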

For the Beta family of distributions, the KL divergence between $f_i(\theta)$ and $\pi(\theta)$ is

$$d_i = \mathrm{KL}(f_i \,\|\, \pi) = \ln\left(\frac{B(a^*, b^*)}{B(a_i, b_i)}\right) + (a_i - a^*)\psi(a_i) + (b_i - b^*)\psi(b_i) + (a^* - a_i + b^* - b_i)\psi(a_i + b_i) \tag{17}$$

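The marginal prior for $\theta$ is

$$\pi(\theta) = \frac{1}{B(\mathbf{X})}\int_{\mathcal{A}} \frac{\theta^{a^* - 1}(1 - \theta)^{b^* - 1}}{B(a^*, b^*)}\prod_{i=0}^K \alpha_i^{x_i - 1}\,d\boldsymbol\alpha, \tag{18}$$

where $a^*$ and $b^*$ depend on $\boldsymbol\alpha$ as in (15). This integral can also be approximated efficiently through
Monte Carlo sampling. We provide a simple implementation using the Stan (Stan Development Team, 2014)
probabilistic programming language at https://github.com/maxbiostat/opinion_pooling. R code for the
methods, figures and tables presented in this paper can also be found at the above link.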
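As a language-neutral illustration of the Monte Carlo approach (not the Stan implementation mentioned above), the Python sketch below draws $\boldsymbol\alpha$ from the Dirichlet prior and then $\theta \mid \boldsymbol\alpha \sim \mathrm{Beta}(a^*, b^*)$, so the $\theta$ draws are samples from (18). It uses the expert parameters and $x_i = 1/4$ from the example that follows; the seed and number of draws are arbitrary. Because each $a^*/(a^* + b^*)$ is a weighted average of the expert means $m_i$, the sample mean of $\theta$ should fall between the smallest and largest $m_i$.

```python
import random


def sample_dirichlet(x, rng):
    """Draw alpha ~ Dirichlet(x) by normalising independent Gamma(x_i, 1) variates."""
    g = [rng.gammavariate(xi, 1.0) for xi in x]
    total = sum(g)
    return [gi / total for gi in g]


def sample_marginal_prior(a, b, x, n_draws, seed=0):
    """Draw theta from (18): alpha ~ Dirichlet(x), then theta | alpha ~ Beta(a*, b*)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        alpha = sample_dirichlet(x, rng)
        a_star = sum(w * ai for w, ai in zip(alpha, a))
        b_star = sum(w * bi for w, bi in zip(alpha, b))
        draws.append(rng.betavariate(a_star, b_star))
    return draws


a = [18.10, 3.44, 8.82, 1.98]
b = [0.955, 0.860, 0.924, 0.848]
theta = sample_marginal_prior(a, b, [0.25] * 4, n_draws=20000, seed=2015)
print(sum(theta) / len(theta))
```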

Here we analyse an example proposed by Savchuk and Martz (1994) (also discussed in Rufo et al.
(2012b)) in which four experts are asked to supply prior information about the survival probability of
a certain unit for which there have been $y = 9$ successes out of $n = 10$ trials. The experts express
their opinions as prior means for the survival probability, which Savchuk and Martz (1994) then use
to construct prior distributions with maximum variance given the restriction on the means. From the
vector of prior means $\mathbf{m} = \{m_0 = 0.95, m_1 = 0.80, m_2 = 0.90, m_3 = 0.70\}$, the authors obtain the
parameters of the Beta distributions for each expert, $\mathbf{a} = \{a_0 = 18.10, a_1 = 3.44, a_2 = 8.82, a_3 = 1.98\}$
and $\mathbf{b} = \{b_0 = 0.955, b_1 = 0.860, b_2 = 0.924, b_3 = 0.848\}$. The resulting prior densities are shown in the
top panel of Figure 1. To complete the analysis, we place a diffuse Dirichlet$(\boldsymbol\alpha \mid \mathbf{X})$ prior on $\boldsymbol\alpha$ with
$x_i = 1/4\ \forall i$. Finally, we compare the prior distributions representing the experts' opinions, as
well as the combined distributions obtained by the different approaches, using the integrated (marginal)
likelihood (Raftery et al. (2007), eq. 9), $l(y) = \int_\Theta f(y \mid \theta)\pi(\theta)\,d\theta$.
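For a Beta$(a, b)$ prior and $y$ successes in $n$ trials under a binomial likelihood, the integrated likelihood has the closed beta-binomial form $l(y) = \binom{n}{y} B(a + y, b + n - y)/B(a, b)$. The Python sketch below evaluates it on the log scale for the expert priors of the example; for a pooled prior one would pass $(a^*, b^*)$ instead. Whether the binomial coefficient or other details match those used for Table 2 is not stated here, so the printed values are not meant to reproduce the table.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def integrated_likelihood(y, n, a, b):
    """l(y) = int Binomial(y | n, theta) Beta(theta | a, b) d theta (beta-binomial pmf)."""
    log_l = math.log(math.comb(n, y)) + log_beta(a + y, b + n - y) - log_beta(a, b)
    return math.exp(log_l)


experts = {0: (18.10, 0.955), 1: (3.44, 0.860), 2: (8.82, 0.924), 3: (1.98, 0.848)}
for k, (a, b) in experts.items():
    print(f"Expert {k}: l(y) = {integrated_likelihood(9, 10, a, b):.3f}")
```

Since $l(\cdot)$ is a probability mass function over $y = 0, \ldots, n$, summing it over all outcomes is a quick correctness check.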


Results and avenues of future research 

Table 1 lists the weights proposed by each method. Figure 1 shows the prior and posterior distributions
for each of the methods, and also for the case in which we assign an equal weight ($1/(K+1) = 1/4$) to each
opinion. It is interesting to note that the maximum entropy criterion discards all opinions but one
(Expert 1). Since $\ln t(\boldsymbol\alpha)$ is concave, we expect the entropy to be maximised on the boundary of the simplex,
possibly at one of its vertices. Minimising the Kullback-Leibler divergence between each prior and the
pooled prior yields a unique solution, but in this case it also discards two of the opinions. By contrast,
using a hierarchical Dirichlet prior for the weights gives rather different results from the first two methods,
assigning almost equal weights to each of the opinions. The integrated likelihoods in Table 2 and the
densities in Figure 1 give some insight into these results: all three methods lead to similar pooled
distributions. Note that the only distribution with a substantially different $l(y)$ is that of Expert 3, who gave
a rather divergent mean for the survival probability ($m_3 = 0.70$).

In conclusion, if the prior distributions (opinions) are not radically different, all three methods will
probably lead to similar combined priors. Although this is the case for the simple univariate example
presented, it remains to be seen whether it holds for high-dimensional $\theta$ under complex sampling
distributions. In light of these results, future research should focus on cases
where there is substantial heterogeneity among the available opinions. Moreover, a sensitivity analysis for
$\pi(\boldsymbol\alpha)$ is desirable to understand how much we can learn about the experts' reliabilities a posteriori.
Table 1: Weights obtained using the three methods for the proportion estimation problem. (a) Kullback-Leibler.
(b) Posterior mean for α.

Method                      α0      α1      α2      α3
Maximum entropy             0.00    1.00    0.00    0.00
Minimum KL(a) divergence    0.04    0.96    0.00    0.00
Hierarchical prior(b)       0.26    0.24    0.26    0.23


Table 2: Integrated likelihoods (l(y)) for the priors of each expert as well as the combined priors.
(a) Calculated using the posterior mean of α.

Expert priors    l(y)      Pooled priors       l(y)
Expert 0         0.237     Equal weights       0.254
Expert 1         0.211     Maximum entropy     0.211
Expert 2         0.256     Minimum KL          0.223
Expert 3         0.163     Hierarchical(a)     0.255
Figure 1: Prior and posterior densities for θ. The top panel shows the distributions elicited by each
expert (data from Savchuk and Martz (1994)) and the bottom panel shows the pooled priors and posteriors
obtained using each of the three methods discussed in this paper. The dashed vertical line marks the
maximum likelihood estimate of θ, θ̂ = 9/10.